OcrV1, Main, Exploration, bibRecord, 000775

Local features-based script recognition from printed bilingual document images

Internal identifier: 000775 (Main/Exploration); previous: 000774; next: 000776

Authors: S. Abirami [India]; D. Manjula [India]

Source:

International Journal of Computer Applications in Technology [ISSN 0952-8091]; 2010.

RBID : Pascal:11-0056451

French descriptors

Pascal (Inist)
- Reconnaissance forme, Reconnaissance caractère, Reconnaissance optique caractère, Analyse documentaire, Langage naturel, Texte, Traitement image, Grille, Arbre décision, Document imprimé, Multilinguisme, Bilinguisme, Classification hiérarchique, Alphabet, Modélisation, Evaluation performance, 52477.
Wicri:
- topic : Multilinguisme, Bilinguisme.

English descriptors

KwdEn:
- Alphabet, Bilingualism, Character recognition, Decision tree, Document analysis, Grid, Hierarchical classification, Image processing, Modeling, Multilingualism, Natural language, Optical character recognition, Pattern recognition, Performance evaluation, Printed document, Text.

Abstract

Classifying and identifying the language in a biscript document is an important step in the design of an OCR system for successful analysis and recognition. This paper presents an architecture for script recognition in bilingual (Tamil, English) document images. It addresses recognition at the character level by predicting the script of a word image from its initial character, thereby adapting to various font faces and sizes. The recogniser models every character as Tetra bit values (TBV), which correspond to the spatial spread derived from the segmented grids of the character. We employed a decision tree classifier (DTC) to classify the script from the patterns generated from the TBV. A spatial features-based script recogniser (SFBSR) is trained and tested on bilingual document images, consisting of various Tamil and English words, to show its effectiveness for script identification. Classification accuracy on the training and testing sets is promising. A comparison of system performance against various other techniques shows a significant improvement for SFBSR. The recogniser can be embedded in an OCR system ahead of its recognition stage.
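
The grid idea in the abstract can be illustrated roughly as follows. The abstract does not define how the TBV are computed, so this is only a sketch under loud assumptions: a 2x2 grid (four cells, one bit each), a bit set when the fraction of ink pixels in a cell reaches a threshold, and a toy glyph. The function name `grid_bits`, its parameters, and the glyph are hypothetical and are not taken from the paper.

```python
def grid_bits(glyph, rows=2, cols=2, threshold=0.5):
    """Return one bit per grid cell: 1 if the cell's ink fraction >= threshold.

    Assumption: `glyph` is a rectangular list of rows of 0/1 pixels
    (1 = ink) that is at least `rows` x `cols` in size.
    """
    height, width = len(glyph), len(glyph[0])
    bits = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            cell = [glyph[y][x]
                    for y in range(top, bottom)
                    for x in range(left, right)]
            ink_fraction = sum(cell) / len(cell)
            bits.append(1 if ink_fraction >= threshold else 0)
    return tuple(bits)


# Toy 4x4 binary glyph, shaped roughly like an "L" (hypothetical data).
GLYPH_L = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]

pattern = grid_bits(GLYPH_L)
print(pattern)
```

A bit pattern of this kind is the sort of feature vector a decision tree classifier could take as input; the actual features and tree used by SFBSR are described in the paper itself, not here.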

Affiliations:

India

Links toward previous steps (curation, corpus...)

to stream PascalFrancis, to step Corpus: 000153
to stream PascalFrancis, to step Curation: 000620
to stream PascalFrancis, to step Checkpoint: 000149
to stream Main, to step Merge: 000780
to stream Main, to step Curation: 000775

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Local features-based script recognition from printed bilingual document images</title>
<author><name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, Anna University</s1>
<s2>Guindy, Chennai 600 025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Guindy, Chennai 600 025</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, Anna University</s1>
<s2>Guindy, Chennai 600 025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Guindy, Chennai 600 025</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">11-0056451</idno>
<date when="2010">2010</date>
<idno type="stanalyst">PASCAL 11-0056451 INIST</idno>
<idno type="RBID">Pascal:11-0056451</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000153</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000620</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000149</idno>
<idno type="wicri:doubleKey">0952-8091:2010:Abirami S:local:features:based</idno>
<idno type="wicri:Area/Main/Merge">000780</idno>
<idno type="wicri:Area/Main/Curation">000775</idno>
<idno type="wicri:Area/Main/Exploration">000775</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Local features-based script recognition from printed bilingual document images</title>
<author><name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, Anna University</s1>
<s2>Guindy, Chennai 600 025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Guindy, Chennai 600 025</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, Anna University</s1>
<s2>Guindy, Chennai 600 025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Guindy, Chennai 600 025</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">International journal of computer applications in technology</title>
<title level="j" type="abbreviated">Int. j. comput. appl. technol.</title>
<idno type="ISSN">0952-8091</idno>
<imprint><date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International journal of computer applications in technology</title>
<title level="j" type="abbreviated">Int. j. comput. appl. technol.</title>
<idno type="ISSN">0952-8091</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Alphabet</term>
<term>Bilingualism</term>
<term>Character recognition</term>
<term>Decision tree</term>
<term>Document analysis</term>
<term>Grid</term>
<term>Hierarchical classification</term>
<term>Image processing</term>
<term>Modeling</term>
<term>Multilingualism</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Performance evaluation</term>
<term>Printed document</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance forme</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Analyse documentaire</term>
<term>Langage naturel</term>
<term>Texte</term>
<term>Traitement image</term>
<term>Grille</term>
<term>Arbre décision</term>
<term>Document imprimé</term>
<term>Multilinguisme</term>
<term>Bilinguisme</term>
<term>Classification hiérarchique</term>
<term>Alphabet</term>
<term>Modélisation</term>
<term>Evaluation performance</term>
<term>52477</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Multilinguisme</term>
<term>Bilinguisme</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Classifying and identifying the language in a biscript document is an important step in the design of an OCR system for successful analysis and recognition. This paper presents an architecture for script recognition in bilingual (Tamil, English) document images. It addresses recognition at the character level by predicting the script of a word image from its initial character, thereby adapting to various font faces and sizes. The recogniser models every character as Tetra bit values (TBV), which correspond to the spatial spread derived from the segmented grids of the character. We employed a decision tree classifier (DTC) to classify the script from the patterns generated from the TBV. A spatial features-based script recogniser (SFBSR) is trained and tested on bilingual document images, consisting of various Tamil and English words, to show its effectiveness for script identification. Classification accuracy on the training and testing sets is promising. A comparison of system performance against various other techniques shows a significant improvement for SFBSR. The recogniser can be embedded in an OCR system ahead of its recognition stage.</div>
</front>
</TEI>
<affiliations><list><country><li>Inde</li>
</country>
</list>
<tree><country name="Inde"><noRegion><name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
</noRegion>
<name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
</country>
</tree>
</affiliations>
</record>
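
A record of this shape can be read with Python's standard library, with one caveat: the `inist:` and `wicri:` prefixes are used without namespace declarations, so a strict XML parser rejects the text as it stands. The sketch below works around that by declaring placeholder namespace URIs on the root element before parsing. The URIs, the helper names `parse_record` and `summarize`, and the trimmed `RECORD` sample are assumptions made for illustration; they are not part of Dilib or of the record format.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record (same element layout, fewer fields).
RECORD = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">Local features-based script recognition from printed bilingual document images</title>
<author><name sortKey="Abirami, S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1"><country>Inde</country></affiliation></author>
<author><name sortKey="Manjula, D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1"><country>Inde</country></affiliation></author>
</titleStmt></fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Alphabet</term>
<term>Bilingualism</term></keywords></textClass></profileDesc>
</teiHeader></TEI></record>"""

# Placeholder URIs (assumption): any unique string lets the parser
# accept the otherwise undeclared inist: and wicri: prefixes.
PLACEHOLDER_NS = ' xmlns:inist="urn:example:inist" xmlns:wicri="urn:example:wicri"'


def parse_record(text):
    """Declare the placeholder namespaces on <record>, then parse."""
    patched = text.replace("<record>", "<record" + PLACEHOLDER_NS + ">", 1)
    return ET.fromstring(patched)


def summarize(root):
    """Collect the title, author names, and English keywords."""
    title_stmt = root.find(".//titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "keywords": [term.text for term in
                     root.findall(".//keywords[@scheme='KwdEn']/term")],
    }


summary = summarize(parse_record(RECORD))
print(summary["title"])
print(summary["authors"])
print(summary["keywords"])
```

Reading from `titleStmt` rather than searching the whole tree avoids counting the authors twice, since the full record repeats them under `sourceDesc` and again under `affiliations`.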

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000775 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000775 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:11-0056451
   |texte=   Local features-based script recognition from printed bilingual document images
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development! It is generated automatically from raw corpora, so the information has not been validated.
